DOCUMENT RESUME

ED 324 349                                   TM 015 594

AUTHOR          Coley, Richard J., Ed.
TITLE           Testing. ETS Policy Notes, Volume 2, Number 3.
INSTITUTION     Educational Testing Service, Princeton, NJ. Policy
                Information Center.
PUB DATE        Aug 90
NOTE            13p.
AVAILABLE FROM  ETS Policy Information Center (04-R), Rosedale Road,
                Princeton, NJ 08541.
PUB TYPE        Collected Works - Serials (022) -- Reports -
                Evaluative/Feasibility (142)
JOURNAL CIT     ETS Policy Notes; v2 n3 Aug 1990
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     Accountability; *Achievement Tests; Constructed
                Response; *Educational Assessment; Educational
                Change; Educational Policy; Educational Research;
                Elementary Secondary Education; Essay Tests;
                Mathematics Tests; National Programs; Performance;
                Science Tests; Standardized Tests; *State Programs;
                Testing Problems; *Testing Programs; Writing Tests
IDENTIFIERS     *Connecticut; Performance Based Evaluation



ABSTRACT 

Three articles on current research in testing are 
presented. The first article, "Testing in the Schools", discusses the 
role of testing in educational reform. In the 1980s, the overwhelming
purpose of state standardized testing has become promoting 
accountability in areas of: (1) monitoring; (2) gatekeeping; (3) 
remediation; and (4) funds distribution. Educational policy makers 
need to find some way to evaluate the tests. Some guideposts for 
evaluating testing are suggested: making sure that instructional 
outcomes and learning outcomes guide the testing; determining how a 
test protects against bias in race, gender, or ethnicity; making sure 
that appropriate techniques are used; and making testing for 
accountability less obtrusive. The second article, "Constructed 
Response Testing: Some Development Efforts", examines two approaches 
focusing on student-developed solutions to questions that can be 
economically scored. The first approach involves the use of an answer 
grid to record answers to mathematical questions, and the second 
approach involves the use of figural response items for science
testing. The third article, "Assessing Performance", describes some 
of the work conducted at the Educational Testing Service (ETS) and in 
Connecticut in the area of student performance assessment. The 
"Learning by Doing" project of the National Assessment of Educational 
Progress is described. Also discussed are: a writing portfolio study; 
the Arts PROPEL program in Pittsburgh (Pennsylvania); and 
Connecticut's Common Core of Learning Assessment Project. Four
figures illustrate the discussions. (SLD) 






ETS POLICY NOTES

News from the ETS Policy Information Center 



Volume 2, Number 3



Educational Testing Service Princeton, New Jersey 



August 1990 



As we begin the 1990s, it is clear
that public education and large-scale
standardized testing have become
interdependent. According to a recent
report of the National Commission on
Testing and Public Policy, each year
elementary and secondary school
students take 127 million standardized
tests mandated by states and districts,
and 20 million school days are devoted
to such testing. This averages to about
three tests per year per student.
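
As a rough check on these figures, the averages can be reproduced with simple arithmetic. The sketch below takes the report's totals at face value; the variable names are illustrative, and the implied number of students is an inference from the "about three tests" average, not a number stated in the report.

```python
# Figures quoted from the National Commission on Testing and Public Policy.
tests_per_year = 127_000_000       # mandated standardized tests taken each year
school_days_per_year = 20_000_000  # school days devoted to that testing
tests_per_student = 3              # "about three tests per year per student"

# Inference (not stated in the source): enrollment implied by the average.
implied_students = tests_per_year / tests_per_student

# School days of testing per test taken.
days_per_test = school_days_per_year / tests_per_year

print(f"Implied students: about {implied_students / 1e6:.0f} million")
print(f"Testing time per test: about {days_per_test:.2f} school days")
```

Read this way, each mandated test accounts for a small fraction of a school day, while the total across the system is large.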

It is not surprising, with such
widespread activity, that the
desirability and effects of standardized
testing are under scrutiny. A number
of different concerns about testing have
emerged, making it difficult for
education officials, policy makers,
and the public to achieve focus in
discussion and debate. One of the
most fundamental concerns is
determining the relationship
between classroom instruction
and standardized tests.

A conventional view is that
educators decide what to teach
and use a test to see if students
have learned what was supposed
to be taught. The alternative is to
have the test shape instruction.
The latter view has been the basis




for much of the educational
reform during the '70s and early-
to mid-'80s. State-mandated tests
often evolved from this vision of
"tests" as a method for controlling
what goes on in the classroom.

Recently, however, arguments
have been advanced that the
state regulatory approach is too
limited, or too centralized, and
that nothing less than total
"restructuring" is now necessary. As
the century closes, testing itself will
be tested for its ability to turn
around an educational system
rated among the lowest performing
in the industrialized world.



It is important to understand
how standardized testing became
the focal point of educational
reform. During the 1970s, Americans
perceived a decline in
educational standards and,
consequently, demanded a return
to basics. These demands fueled a
"minimum competency testing"
movement, particularly in reading
and mathematics. In their 1982
study, Measuring the Quality of
Education, Willard Wirtz and
Archie Lapointe reported that






Thirty-nine states adopted
minimum competency testing
programs. Standardized tests
were either developed by state
agencies or obtained from
commercial publishers. In many
cases, specific scores on the
examinations were set as
marking the lowest levels of
competency that could be
considered acceptable.

The education system responded
to the demand for
establishing minimum competencies
and succeeded in raising test
scores, particularly in lower
performance groups. By 1983,
however, it was "excellence" that
was being demanded, not "minimum
competence." In A Nation
At Risk, the National Commission
on Excellence in Education stated:
"'Minimum competency'
examinations fall short of what is

This Issue: Testing

One    Testing in the Schools
Two    Constructed Response Testing: Some Development Efforts
Three  Assessing Performance



needed, as the minimum tends
to become the maximum, thus
lowering educational standards
for all."

The Commission set a new
standard: "Excellence characterizes
a school or college that sets
high expectations and goals for all
learners, then tries in every way
possible to help students reach
them."

The 1970s wave of standardized
testing had been tested and
found wanting. Although testing
activity intensified with the issuance
of A Nation at Risk in 1983,
mass testing for purposes of
school, district, and state accountability
was not predicted or
prescribed by that report. Rather,
the report recommended the use
of "standardized tests of achievement"
for three purposes:

1) to certify the student's
credentials;

2) to identify the need for
remedial intervention; and

3) to identify the opportunity
for advanced or accelerated
work.

All three purposes relate to the
individual student: the first in terms
of assessing achievement, and the
other two as aids for determining
the proper course of instruction.

The great majority of states
already had statewide testing
programs when A Nation at Risk
was issued. Five years later, in
1989-90, 47 states required that
local school districts test public
school students at some point or
points between grades 1 and 12.
This represented an increase of
only five states from 1984-85, but
during that period many states
broadened their testing programs:

* Eleven added new grade levels
to be tested, including
pre-kindergarten and pre-first grade.

* Six added science and social
studies to their testing program,
and many more added writing,
especially essays, to replace
multiple-choice exams in language
arts.

* Two states moved from testing
representative samples of
students in a grade to testing all
the students.

* Three states switched from
allowing local school districts to
choose their tests to mandating
the use of a state-selected
instrument.

While there are a great many
distinctions in the scope and
purpose of various state testing
programs, they can be roughly
classified into four categories (see
Figure 1). The overwhelming
purpose of state testing programs
is to promote accountability, and
this purpose falls squarely within
the regulatory approach to school
reform of the 1980s. Thirty-eight of the 47
states use tests for monitoring

Figure 1. State Testing Programs and Purposes, 1990



school and/or district performance. 

Twenty-three states require tests 
for grade promotion or high 
school graduation; another 20 
employ testing programs to 
identify students in need of 
remediation; and nine states use 
them in decisions about the 
distribution of funds.
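
The counts in this passage can be tabulated to show how heavily each purpose is represented. The sketch below uses only the numbers given in the text; the short category labels and variable names are illustrative.

```python
# Counts reported in the text for 1989-90 (47 states with mandated testing).
states_with_testing = 47
purpose_counts = {
    "Monitoring school/district performance": 38,
    "Grade promotion or graduation": 23,
    "Identifying students for remediation": 20,
    "Distribution of funds": 9,
}

# Share of the 47 testing states that report each purpose.
shares = {p: n / states_with_testing for p, n in purpose_counts.items()}
for purpose, share in shares.items():
    print(f"{purpose:<40} {purpose_counts[purpose]:>2} states ({share:.0%})")

# The counts add up to more than 47, so some states must use their
# tests for more than one purpose.
total_uses = sum(purpose_counts.values())
print("Total purpose counts:", total_uses)
```

Because the four counts sum to more than the number of states, the categories overlap and the shares should not be read as parts of a whole.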

How Tests Get Used

Of course, from the available 
data, there is no way to determine 
how the testing programs are 
used in the day-to-day practice of 
education. While some uses are
obvious, such as to allocate 
monetary rewards and penalties 
to specific schools or to enforce 
high school graduation require- 
ments, others are more difficult to 
identify. 

When tests are used to monitor 
education, who is looking at 
scores, with how much attention, 
and with what result? When they 
are used in remedial programs, 



[Figure 1 (bar chart): Number of State Testing Programs by Purpose, from Monitoring to Funds Distribution; horizontal axis: Number of States]






how many students actually 
receive different instruction as a 
result of the tests they have taken, 
and how is the choice of remedial
program guided by the test? Are
teachers and schools simply trying
to score better on the tests by
narrowing instruction, or are they
responding by doing a better job
of instruction? While there has
been massive testing, there has 
been minimal study of what 
actually happens to students, 
teachers, and schools as a result 
of this testing. 

In the mid-'80s, however, the
Center for the Study of Evaluation
at the University of California at Los
Angeles did conduct a national
study dealing with the uses of
testing (of all kinds) in the decisions
and judgments teachers
make. The study reported these
findings:

(the survey results demonstrate
that) teachers do use
test results of various types in
making common instructional
decisions. They also reveal
quite clearly, however, that
teachers place greatest trust
in their own observations of
students' class performance
and in their personal, clinical
judgment. Nearly every
teacher reporting says that
their own observations and
students' classroom work are
crucial or important sources
of information for initially
grouping or placing students,
in deciding to change
students' placement or
grouping, and in determining
students' report-card grades.
The majority also give heavy
weight to the results of their
own, self-constructed tests in
each of these tasks.

Other studies show that teachers
do not use the reams of test results
they get from the tests that school
systems buy. This is the kind of
research into actual test use that
should be brought to bear on
testing decisions and controversies
but seldom is.

A Babel of Issues

The discussion about testing
lacks focus because a myriad of
issues is being addressed by different
people with different objectives.
The range of concerns
includes:

* The ability of present forms of
standardized tests to capture
thinking and problem solving
skills

* The appropriate uses of tests to
promote school accountability

* Sole reliance on tests in decision
making

* Concern about the adequacy
of the "norming" samples
frequently used

* Use of too much instructional
time for testing

* Sole reliance on multiple-choice
testing formats

* The proper fit between what is
tested and what is taught

* The appropriateness of stressing
the teaching of test-taking skills,
"aligning" the curriculum, or
actually teaching the test items

* Race, ethnic, or gender
discrimination

These issues regarding testing
within the K-12 education system
differ in some respects from those
in admissions testing, where
predictive validity is a key matter
in judging test quality and
appropriateness.

The complications arising from
this "Babel" of issues are compounded
by the strong emotions
that testing often engenders
because of personal experiences
and the critical ways tests sometimes
shape people's lives.

At the same time that the
debate is carried out on shifting
ground, the standardized testing
business is highly technical, with its
historical grounding in psychology
and statistics. Few engaged in the
debate understand such esoteric
terminology as item response
theory, domains, questions of
dimensionality, and norm and
criterion referencing, to name a
few. And common terms such as
validity and reliability have taken
on very technical meaning in
testing. Now, the terms "authentic
assessment" and "performance
testing" are entering the discussion.

A Few Guideposts

It is amidst this bewilderment
and confusion that educators,
policy makers, and the public
must form opinions and make
decisions that will affect the future
of our education system. These
are decisions that should not be
delegated to technicians. It is
becoming more and more apparent
that to make critical choices
about elementary and secondary
education, we need to find or
develop methods to "test" the
testing programs. Unfortunately,
tests are used for so many different
purposes in the education
system that no single set of rules
will suffice. There is now emerging,
however, some consensus within
what might be called "the educational
testing reform movement"
as to the general directions testing
needs to go if educational objectives
are to be achieved. While
the "guideposts" suggested below
are this author's formulation, they






are believed to parallel what this
reform movement is generally
saying. (An additional source of
guidance for educators, policy
makers, and others is The Code of
Fair Testing Practices in Education,
issued by the American Educational
Research Association, the
National Council on Measurement
in Education, and the American
Psychological Association and
publicly endorsed by some leading
publishers of tests, including
ETS.)

Make Sure that Instructional
Objectives and Learning Outcomes
Guide the Testing

Standardized achievement
tests have no intrinsic value in
education other than as
measures of whether instruction
has had its intended
effects. The educational
enterprise determines what
students should be taught
and what they should know.
Make the tests fit the instructional
goals and strategies,
not vice versa. Every test
embodies an implicit theory
of how learning does or
should occur. Make sure that
theory is known.

Know How a Test Protects Against 
Bias in Terms of Race, Ethnicity, 
and Gender 

Test scores should reflect
what students learn. Test
constructors must guard
against scores being affected
by characteristics
extraneous to instructional
objectives and outcomes.
How does the test development
process guard against
bias? After being satisfied as
to cultural fairness, use score
differences among groups of
students to determine where
to focus more instructional
attention.

Make Sure that the Techniques of
Testing Are Appropriate for
Measuring Desired Knowledge
and Skills

This issue has come to the
fore recently with criticism of
sole reliance on multiple-choice
formats in standardized
testing. Terms such as
"authentic," "constructed
response," and "performance"
testing are appearing
more frequently. The
debate about the effects of
multiple-choice test formats
still requires substantial research
about the skills it is
best suited to measure. We
need to know more about
how much difference there
is, in terms of what is measured,
between choosing
among answers and constructing
them. Some research
on this matter is
available, much of it conducted
by researchers at
Educational Testing Service.
More knowledge is needed
about the best applications
for a variety of testing formats,
as is more effort to
establish the measurement
characteristics of performance
tests. Serious desire
for open-ended questions
and performance measures
must be matched by serious
attention to the time and
resource implications that are
involved.

Make Testing for Accountability 
Less Intrusive 

Every ten years the nation
conducts a census. Its regular
information comes from
carefully constructed national
samples of households,
carried out by carefully
trained interviewers, using
instruments carefully developed
and tested over many
years. This is also the means
by which the widely respected
National Assessment
of Educational Progress
gathers its data about
achievement. This type of
sampling system could be
used for accountability
testing. It would intrude less
on valuable class time and
would not interfere as much
with legitimate instructional
objectives determined by
schools and districts. The use
of sampling systems for
evaluation also would permit
a lot more flexibility to introduce
formats other than
multiple-choice. Of course,
while this can be done to
evaluate the performance of
the system, only individual
testing can be used to satisfy
a high school graduation
requirement or to inform an
instructional decision about
an individual student.
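
The sampling idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 100,000 students, the 60 percent rate, and the sample of 2,000 are all hypothetical, and a simple random sample is used for clarity, which is simpler than the designs that large assessments actually use.

```python
import math
import random

rng = random.Random(1990)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 students, each either meeting a
# standard (True) or not (False). The 60% rate is an assumption.
population = [rng.random() < 0.60 for _ in range(100_000)]
true_rate = sum(population) / len(population)

# Test only a simple random sample instead of every student.
sample_size = 2_000
sample = rng.sample(population, sample_size)
estimate = sum(sample) / sample_size

# Standard error of a proportion estimated from a simple random sample.
std_error = math.sqrt(estimate * (1 - estimate) / sample_size)

print(f"True rate (unknown in practice): {true_rate:.3f}")
print(f"Sample estimate: {estimate:.3f} +/- {1.96 * std_error:.3f}")
```

In this sketch, testing 2 percent of the students gives a system-level estimate with a margin of roughly two percentage points, but it says nothing about any individual student, which is the limitation noted above.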

Last February President Bush and 
the nation's governors announced 
National Goals for Education, to 
be achieved by the year 2000. 
They are very ambitious goals. 
Testing and assessment are inter- 
woven with educational practice 
and management, and comple- 
mentary goals will need to be 
established. To be useful, such 
goals will have to emerge from a 
very thoughtful process, one that 
is informed by relevant research 
and analysis of existing programs, 
not by the slogans arising from 
heated debates about testing.






Constructed Response Testing: Some Development Efforts 



There is a growing view that
tests in elementary and secondary
schools should not rely so heavily
on multiple-choice formats, but
there is also the reality that
multiple-choice testing has an incomparable
advantage from a cost
standpoint: the answer sheets can
be machine scored. In complete
open-ended question formats,
each paper has to be graded by
a human being and under carefully
controlled conditions that
assure that uniform scoring standards
are applied. Two separate
development efforts at Educational
Testing Service have explored
approaches that focus on
student-developed solutions to
questions that can be economically
scored. The first involves the
use of a "grid" for recording
answers to mathematics questions.
The second uses "figural
response items" in science, where
the examinee is called upon to
complete a partially completed
figure, or otherwise indicate
something on a drawing or graph.
These examples are among a
considerable number of new
approaches to testing formats
being developed at ETS.



ETS has developed and tested
two prototypes in which multiple-choice
mathematics questions
were converted, so that the
answer could be recorded in a
grid that can still be machine
scored. In one prototype, the
multiple-choice and the grid
versions were each given to
equivalent samples of over 900
high school juniors and seniors
(see Figure 2 for an example from
each test).



Figure 2 



The Question: 

Section I of a certain theater contains 12 rows of 15 seats each. 
Section II contains 10 rows, but has the same total number of seats as 
Section I. If each row in Section II contains the same number of seats, 
how many seats are in each row? 



Test 1, Multiple-Choice Version

(A) 16
(B) 17
(C) 18*
(D) 19
(E) 20

Test 2, Grid Version

[Answer grid: the examinee records the numerical answer by filling in machine-scorable bubbles; grid image not reproduced]
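
The arithmetic behind the Figure 2 question, and the difference between scoring a chosen letter and scoring a gridded number, can be sketched as follows. The two scoring functions are hypothetical illustrations and not ETS's scoring procedure; only the question, the five options, and the keyed answer come from the figure.

```python
# Worked solution to the Figure 2 question.
rows_section_1, seats_per_row_1 = 12, 15
rows_section_2 = 10

total_seats = rows_section_1 * seats_per_row_1   # 12 * 15 = 180
seats_per_row_2 = total_seats // rows_section_2  # 180 / 10 = 18

# Multiple-choice version: the examinee picks one of five letters.
choices = {"A": 16, "B": 17, "C": 18, "D": 19, "E": 20}

def score_multiple_choice(letter: str) -> bool:
    """Return True if the chosen option equals the correct value."""
    return choices.get(letter.strip().upper()) == seats_per_row_2

def score_grid(bubbled_digits: str) -> bool:
    """Hypothetical grid scorer: compare the digits the examinee bubbled in."""
    digits = bubbled_digits.strip()
    return digits.isdigit() and int(digits) == seats_per_row_2

print(seats_per_row_2)
print(score_multiple_choice("C"), score_grid("18"), score_grid("17"))
```

With five options, a blind guess on the multiple-choice version is right one time in five; the grid offers no list of candidate answers to choose from, so the examinee has to produce the number.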



In comparing the results, ETS
found that the grid questions
"worked well" from a test construction
point of view, and that
the results were not differentially
related to gender. However, the
"gridding in" test was harder: the
average percent correct for the
multiple-choice version was 54.5
percent, compared to 47.4 percent
for the grid version.

In one test, students were given
both multiple-choice and "grid in"
questions and asked to make
some comparisons. Seventy-six
percent thought the multiple-choice
questions were easier to
answer. Fifty-seven percent
thought the grid-in questions were
"a better measure" of their ability
in mathematics. Twenty-two
percent thought multiple-choice
questions were a better measure,
and 21 percent saw no difference.

In a second stage of the
project, a more elaborate
prototype was developed, making
it possible for students to grid
fractions, decimals, and whole
numbers.
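
One practical question such a prototype raises is how a scoring program would treat answers entered in different forms. The sketch below is an assumption-laden illustration, not a description of the ETS prototype: it assumes gridded responses arrive as text such as "18", "0.75", or "3/4", and that equivalent forms should receive credit. The function names are hypothetical.

```python
from fractions import Fraction

def parse_gridded_value(response: str) -> Fraction:
    """Convert a gridded response such as '18', '0.75', or '3/4' to an exact value."""
    return Fraction(response.strip())

def is_correct(response: str, key: str) -> bool:
    """Treat equivalent forms (3/4, 0.75, 6/8) as the same answer."""
    try:
        return parse_gridded_value(response) == parse_gridded_value(key)
    except (ValueError, ZeroDivisionError):
        return False  # unreadable or malformed grid entry

print(is_correct("0.75", "3/4"))
print(is_correct("6/8", "3/4"))
print(is_correct("18", "18.0"))
print(is_correct("3/", "3/4"))
```

Exact rational arithmetic avoids rounding problems when a decimal entry is compared with a fractional key.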






While the use of grid-in type
items seems to be operationally
feasible, based on research to
date, James Braswell of ETS states
that "additional study is needed
to refine the directions and the
format and to determine the
appropriate timing and difficulty
level for a group of items in this
format." It is also necessary to
determine what age groups are
appropriate for use of the grid
format.

Figure 3 



Figural Responses

An alternative to multiple- 
choice questions in science 
testing is to use a drawing which is 
not complete, or on which an 
examinee can mark a location 
(Figure 3). For example, examinees
may be asked to indicate a
direction by drawing in arrows or 
to show the location of an ana- 
tomical flaw in a diagram of a 



heart. While this approach to free 
response is one that does not 
depend on multiple-choice 
questions, the figures are de- 
signed so that the answers can still 
be machine scored. 

A study using such "figural
response" questions was part of
the field test for the development
of the 1990 National Assessment of
Educational Progress (NAEP). The
results of the field test disclosed
that the figural response items
were, in general, more difficult
than their multiple-choice counterparts.
The figural response
questions were used with samples
of students in grades 4, 8, and 12,
with the samples designed to
represent a broad range of
student characteristics, such as
racial/ethnic group membership
and socioeconomic status. Sixteen
of the figural response items
tested were used by NAEP.

• • •

Both these descriptions are
taken from presentations made at
the April 1990 meeting of the
American Educational Research
Association. The first, on grids, was
presented by James Braswell of
Educational Testing Service; this
paper is being revised and will be
available in final form in
the fall of 1990. The second, on
figural response items, was by
Michael E. Martinez, also of Educational
Testing Service. His paper
is titled "A Comparison of Multiple-Choice
and Constructed Figural
Response Items," April 1990.



On the diagram below, draw where you think the water level would be after
all the water in the beaker is poured into the U-shaped tube.







Assessing Performance 



The terms "problem solving,"
"critical thinking," and "higher
order skills" are becoming pervasive
in discussions about education
reform. The common belief is
that such skills are imparted in
more active learning environments,
or through "hands-on"
approaches. Discussions of how
students learn in this mode lead to
questions about how to assess
what they learn. The discussion
moves in two directions: from
instruction to the question of how
to assess results, and from the
design of a performance assessment
that will drive a "hands-on"
approach to the instruction
necessary to prepare for it. We
briefly report on some of the
pioneering work in assessing
performance at Educational
Testing Service (ETS) and in the
state of Connecticut, two of the
many places where performance-based
measures are being tried
and used.

The NAEP "Learning by Doing" Project

The National Assessment of
Educational Progress (NAEP),
administered by ETS, has developed
and pilot-tested a variety of
hands-on science and mathematics
tasks to be used as prototypes
for future assessments.* The tasks
required students to think independently
about a variety of
relationships. Here are a few
examples.

At the first level, students are
asked to classify and sort by
identifying common characteristics
of plants and animals.

• At grades 7 and 11, students
were asked to sort a collection
of small-animal vertebrae into
three groups and explain how
the bones in each grouping are
alike.

At the next level, students are
given materials and asked to
observe, infer, and formulate
hypotheses.

• At grades 3 and 7, students are
asked to describe what happens
when a drop of water is
placed on different types of
building materials and then
apply what they have learned
by hypothesizing what the water
will do when placed on an
unknown material.

At the most complex level, students
are asked to design and
conduct complete experiments.

• At grade 11, students are asked
to design a reliable experiment
to determine the effects of
exercise on heart rate. Students
need to identify the variables to
be manipulated, specify what
needs to be measured, and
describe how the measurements
should be made. (This
exercise was included as a
prototype to assess students
when actual experimentation in
a classroom or assessment
setting is difficult.)

The results of the pilot test were
encouraging. Although managing
equipment and training administrators
required ingenuity and
painstaking effort, the project
showed that conducting hands-on
assessment is feasible and
extremely worthwhile. Professional
educators were enthusiastic;
students were engaged by the
tasks; and school staff encouraged
further use of these kinds of
tasks in both instruction and
assessment. However, NAEP has
not yet been funded to actually
carry out such a "hands-on"
assessment with a national sample
of students.**

Writing Portfolios 

NAEP staff at ETS have designed 
an experimental writing portfolio 
study to be conducted this year 
that will permit an evaluation of 
writing that students produce for
their school assignments, rather 
than within a testing situation. The
project specifies that teachers 
submit students' writing that has 
been produced in school for an 
assignment. In addition to provid- 
ing more extensive writing samples 
for assessment than is possible in a 
testing situation, the hope is that 
the portfolio materials will provide 
some information about the kinds 
of writing tasks being assigned in 
the nation's classrooms. The
portfolio approach offers an 
opportunity to use the best of 
current knowledge about writing 
theory and instruction in the 
design and implementation of 
more appropriate forms of writing 
assessment. It also implements on 
a national scale some of the 
innovative writing portfolio efforts 



* NAEP is administered under a contract with the U.S. Department of Education's National Center for Education Statistics (NCES). The Learning by Doing Project was funded
by the National Science Foundation through a grant to NCES.

** In response to the positive results of the pilot study, Learning by Doing was published to describe the tasks field tested by NAEP. It is available from NAEP, P.O. Box 6710,
Princeton, NJ 08541-6710, for $5 plus $1.50 shipping and handling. Learning by Doing was adapted from A Pilot Study of Higher Order Thinking Skills Assessment
Techniques in Science and Mathematics: Final Report. This two-volume, 537-page report, which describes NAEP's project in detail and presents all 30 tasks included in the
pilot study, is available for $35 plus $1.50 shipping and handling from the address above.






currently being undertaken in
states such as Rhode Island and
Vermont. This approach was
incorporated into the 1990 NAEP
Assessment of Writing and will be
continued in the 1992 assessment.

During the last few years, ETS
personnel have pioneered a
variety of portfolio applications.
Particularly noteworthy are the
contributions of Mary Fowles and
Roberta Camp. In Rhode Island,
for example, ETS staff worked with
the state education department,
the Rhode Island Consortium on
Writing, and Rhode Island teachers
to develop a portfolio-based
program to test the validity of the
state's earlier assessment of third
grade writing.



Creating curricula and assessments
that provide a much richer
depiction of how children learn
music, visual arts and creative
writing is the goal of Arts PROPEL,
a project involving the Pittsburgh
Public Schools, ETS, and Harvard
University's Project Zero. The
project is funded by the
Rockefeller Foundation. PROPEL is
designing assessments that are
woven into daily classroom instruction.
As students produce sketchbooks
and journals, compile portfolios,
and complete carefully
sequenced classroom activities,
they leave behind a series of
"footprints" for teachers about
how they are growing and thinking
as artists. The project's hope is
that the more rapid, qualitative
feedback provided by these


assessments will prove more mean- 
ingful to students and teachers 
than current tests. In addition, 
since the exercises double as 
instructional tools, they are helping 
to modify the curriculum.

Reflective interviews are an 
example of one technique used to 
allow students to judge themselves.
As part of the process of
reflecting on the body of their 
work, students can become aware 
of the particular signature they 
give to prints, performances, or 
poems. This becomes evident in 
the case of Connie, a high school 
junior who, in the course of her 
writing, turned out a series of short 
poems. When asked by her 
teacher to reflect on her writing 
the same way she might think 
about poems by Yeats or 
Dickinson, Connie noticed, for
the first time, that she had a
style, a characteristic signature, as
a writer. She was able to see how
consistently she dealt with the hard
facts and small ironies of everyday
life by making common objects,
like mops, speak (Figure 4).

Figure 4 

A Sample of Connie's Poetry 
Mop 

Woman tall and thin 
With long tangled gray hair 
Must turn her life upside down 
To do her duty 

Hold her breath while washing her hair
Wringing out the dirty water 
Then she goes to her duty 
Again 



Connecticut

The state of Connecticut is 
active on several fronts in the 
development of performance 
assessment. The Connecticut 
Multi-State Performance Assess- 
ment Coalition Team (COMPACT) 
Project, sponsored by the Con- 
necticut State Department of 
Education and the National 
Science Foundation, is a collabo- 
ration of the State Departments of 
Education from Connecticut, 
Michigan, Minnesota, New York, 
Texas, Vermont, and Wisconsin;
the Coalition of Essential Schools
(CES); the Urban District's Leadership
Coalition of the American
Federation of Teachers; and
Project Re:Learning.

Connecticut's Common Core of 
Learning Assessment Project 
assesses high school math and 
science students working together 
in groups to solve problems and 
design and conduct experiments. 
This fits with a view of students as 
knowledge workers, whose job it is 
to construct meaning from what 
they know and the new informa- 
tion they encounter. The teacher's 
role is to be the manager of these 
knowledge workers. 

The Project will use a Core of 
Learning exam, expected to be in 
place in 1991. It is designed to 
force students to think before
answering. Here's an abbreviated
version of one test. The problem is:
How can you really tell which food
market will save you the most
money? Your assignment: Design
and carry out a study to answer 



*** Wolf, Dennie Palmer. "Opening Up Assessment." Educational Leadership, December 1987/January 1988.

For more information about these new projects in Connecticut, contact Joan Boykoff Baron, Connecticut Common Core of Learning Coordinator, Connecticut State
Department of Education, Box 2219, Hartford, CT 06145.






the problem. The Project takes
several steps.

1. Write a report that outlines how
you would solve the problem.
What markets will you com- 
pare? What items? How and 
why did you make your 
choices? What records will you 
keep? How will you analyze the 
data? Keep a log reporting the 
progress of your project. 

2. Form a research group with 3-4
people. You will meet twice in 
class to compare your plans 
and to develop a final, written 
research approach. Hand it in 
for comments and grading. 

3. Carry out the study, with each
group member doing a portion
of the work. Hand in a final
report as a group. The report
should restate the problem that 
was solved, explain how the 
data were collected and 
analyzed, and include graphics 
that will illustrate your conclu- 
sions. 

• • • 

These few examples provide a
sampling of new work being 
undertaken at ETS and at other 
places across the country in 
developing alternative methods 
of performance assessment. They 
are illustrative of new attempts to 
go beyond traditional assessment
methods.



"INTELLIGENT" ASSESSMENT

"Intelligent Assessment is conceived of as an integration of 
three research lines, each dealing with cognitive performance 
from a different perspective: constructed response testing, 
artificial intelligence, and model-based assessment. This 
integration is envisioned as producing assessment methods 
consisting of tasks closer to the complex problems typically 
encountered in academic and work settings ... It is important 
to stress that the emphasis is on assessment that facilitates 
instruction ..." 

Randy Bennett, Educational Testing Service, 1990 



TESTING TO "FACILITATE SUCCESS" 

"New developments in measurement, especially in concert 
with new developments in cognitive and computer science, 
afford both new reasons and new possibilities for developing 
direct measures of student performance. Performance 
processes would be assessed directly by means of work 
samples or simulations of real-world generic tasks, rather than 
in terms of total scores summarizing the piecemeal information 
provided by a set of discrete test items." 

Samuel Messick, Educational Testing Service, 1988



Newsweek, January 8, 1990, p. 58



ETS Policy Information Center Publications 



ETS Policy Notes Newsletters
Vol. 1, No. 1, July 1988

"Black College Faculty: A Dwindling Resource"
"Introducing the ETS Policy Information Center"
"Who's Going to Graduate and Professional Schools?"
"What's Wrong With This Picture?"
"State Profiles of Educational Standards Updated"
"Report Highlights College Minority Retention Programs"
"New Studies Monitor Talent Flow Into Technical Fields"

Vol. 1, No. 2, March 1989 — From High School to College

"Edging Forward: What the SAT Shows About College-Bound Seniors in the 1980s"
"Starting on the Right Track"
"High-Achieving Hispanic Students"

Vol. 1, No. 3, June 1989 — Science

"A Precious Few: Interest of the College-Bound in the Quantitative Sciences"
"A Straggler's View: The U.S. in the World of Science Education"
"Staying Power: Students Who Persist"

Vol. 2, No. 1, October 1989 — The Gender Gap

"The Gender Gap in Education: How Early and How Large?"
"Scholastic Ability"
"Sex Differences in Test Performance: A Synthesis of Research"

Vol. 2, No. 2, March 1990 — Public School Choice

"Choice in Montclair, New Jersey"
"What the Research Says"
"Update on State Activity"

Vol. 2, No. 3, August 1990 — Testing

"Testing in the Schools"
"Constructed Response Testing: Some Development Efforts"
"Assessing Performance"

(Available while supplies last from ETS Policy Information Center (04-R), Rosedale Road, Princeton, NJ 08541.)



Skills Employers Need: Time to Measure Them? A Policy Information Proposal. June 1990

(Available for $3.50 prepaid from ETS Policy Information Center (04-R), Rosedale Road, Princeton, NJ 08541.)

This brief paper summarizes the skills that employers want from job candidates and proposes the development
of an Employment Readiness Profile. This profile would provide a barometer of progress in producing a quality
labor force.






From School to Work. A Policy Information Report. 1990

(Available for $3 Sj prepaid from ETS Publications Order Service, P.O. Box 6736, Princeton, NJ 08541-6736. Order No. 204840)

The U.S. is among the worst in the industrial world in helping students who don't go on to college make the
transition from school to work. This report discusses student work during high school, differences between skills
acquired in the classroom and those needed at the workplace, the information processing skills of high school
graduates, new efforts to integrate academic and vocational education, and the weakness of linkages between
the school and the workplace.

Choice in Montclair, New Jersey. A Policy Information Paper. Beatriz C. Clewell and Myra F. Joy, January 1990
(Available for $5.00 prepaid from ETS Policy Information Center (04-R), Rosedale Road, Princeton, NJ 08541.)

Montclair, New Jersey, is an urban school district that has achieved success in desegregating its schools
through a voluntary magnet school plan based on choice. To study the effectiveness of Montclair's plan in
providing racial balance across schools and educational quality and diversity in programs through the use of
choice, the authors conducted a case study of the district in 1987 and a follow-up in the summer of 1989. The
paper reviews a variety of public school choice programs and describes and evaluates the Montclair model.
The paper outlines the factors contributing to the district's success and offers some recommendations concerning
the development and implementation of similar public school choice plans.

What Americans Study. A Policy Information Report. 1989

(Available for $3.50 prepaid from ETS Publications Order Service, P.O. Box 6736, Princeton, NJ 08541-6736. Order No. S04834)

Increasing course requirements in key academic subjects has been a central theme of educational reform in
the decade of the 1980s. This report provides information on what is being studied and on how this has
changed over time for high school graduates and college-bound seniors. It also describes course-taking
patterns for eleventh-, eighth-, and fourth-grade students.

State Education Indicators: Measured Strides, Missing Steps. Stephen S. Kaagan and Richard J. Coley, 1989

(Available for $3 '6 prepaid from ETS Publications Order Service, P.O. Box 6736, Princeton, NJ 08541-6736. Order No. s3WI?)

The monograph describes the central features of indicator systems and the issues that must be addressed with
regard to their purposes, applications, and effects at the state and local levels. It also provides case studies of
state education indicator systems in California, Connecticut, New York, and South Carolina.

Earning and Learning. Paul E. Barton, March 1989

(Available for $j bO prepaid from the National Assessment of Educational Progress, Educational Testing Service, Rosedale Road, Princeton,
NJ 08541-0001. Order No. ' 1 Wl 0 1)

This report explores the relationship between work and student achievement, using information from the 1986
National Assessment of Educational Progress (NAEP). It relates hours worked per week to student achievement
on the NAEP proficiency scale for each subject area assessed. It describes who works and who does not,
examines the adjustments working students make in other activities, charts the growth of the student work
force, and summarizes the results of major research projects that have addressed the effects of student work on
school performance.

Information for National Performance Goals for Education: A Workbook. November 1989

(Available for $3.50 prepaid from ETS Policy Information Center (04-R), Rosedale Road, Princeton, NJ 08541.)

This workbook was prepared to assist those charged with setting national education performance goals as a
result of the Education Summit held by President Bush and the nation's governors in Charlottesville, Virginia. It
assembles information about current and past educational performance to inform decisions about outcome
goals for the future.



A Conference on 
Construction vs. Choice in 
Cognitive 
Measurement 

Sponsored by 
Educational Testing Service 
November 30 & December 1, 1990
Topics will include 

Studying Differences Between Multiple 
Choice and Free Response 
• 

Non-Test-Based Approaches to 
Cognitive Assessment 
• 

Test Development and Scoring Issues
•

The Politics of Free Response vs. Multiple Choice

For information contact: 
Ms. Terri Sterling • ETS • Mailstop 20-T •
Princeton, NJ 08541 • (609) 734-15i>0



ETS Policy Notes is published by
the ETS Policy Information
Center, Educational Testing
Service, Princeton, NJ 08541-
0001; (609) 734-5694.

Director, ETS Policy Information
Center: Paul E. Barton

Editor: Richard J. Coley

Copyright © 1990 by Educational
Testing Service. All rights reserved.
Educational Testing Service is an
Affirmative Action/Equal Oppor-
tunity Employer.



Educational Testing Service, ETS,
and the ETS logo are registered trade-
marks of Educational Testing
Service.






Appendix 16 



END 

U.S. Dept. of Education
Office of Educational Research and
Improvement (OERI)

ERIC

Date Filmed: March 21, 1991



